Abstract

Background: The packaging of long chromatin fibers in the nucleus poses a major challenge, as it must fulfill both 
physical and functional requirements. Until recently, insights into the chromosomal architecture of plants were 
mainly provided by cytogenetic studies. Complementary to these analyses, chromosome conformation capture 
technologies promise to refine and improve our view on chromosomal architecture and to provide a more 
generalized description of nuclear organization. 

Results: Employing circular chromosome conformation capture, this study describes chromosomal architecture in 
Arabidopsis nuclei from a genome-wide perspective. Surprisingly, the linear organization of chromosomes is 
reflected in the genome-wide interactome. In addition, we study the interplay of the interactome and epigenetic 
marks and report that the heterochromatic knob on the short arm of chromosome 4 maintains a pericentromere-like 
interaction profile and interactome despite its euchromatic surrounding. 

Conclusion: Despite the extreme condensation that is necessary to pack the chromosomes into the nucleus, the 
Arabidopsis genome appears to be packed in a predictive manner, according to the following criteria: heterochromatin 
and euchromatin represent two distinct interactomes; interactions between chromosomes correlate with the linear 
position on the chromosome arm; and distal chromosome regions have a higher potential to interact with other 
chromosomes. 



Background 

In eukaryotic nuclei, chromosomes of considerable length 
are densely packed into a very small volume. In Arabidopsis, 
chromatin with a total length of about 8 cm has to be 
packaged into a nucleus of about 70 µm³ volume and 5 µm
diameter [1,2]. Nonetheless, the extremely dense packaging 
of chromatin does not lead to a chaotic entanglement of 
chromatin fibers. Eukaryotes have evolved mechanisms 
to untangle chromatin and to organize the nucleus into 
structural domains, facilitating chromosome packaging 
and, hence, the accessibility of the information stored 
within chromosomes. Therefore, chromosomal architecture 
is likely to influence the transcriptional state of a given cell, 
and might be a major player in the epigenetic regulation 
of cell fate. 
Over the past 15 years, the field of epigenetics has
grown rapidly, addressing basic questions about the 
long-term regulation of genes, and how diverse cell types 
reach their differentiated states. These studies have 
provided insights into the mechanisms that enable cells to 
differentiate into diverse cell types with distinct phenotypes, 
despite sharing exactly the same genotype. 

To date, most of the commonly studied epigenetic
processes have been shown to involve covalent modifications
of DNA, such as cytosine methylation, modifications
of the core histone proteins H3 and H4, and histone variants.
On this basis, chromatin can be grouped into activating
and repressive chromatin states, defined by their epigenetic
landscape. Among the main players are trimethylation of
lysine 36 of H3 (H3K36me3) and dimethylation of lysine
4 of H3 (H3K4me2), which act as activating marks, and
monomethylation of lysine 27 of H3 (H3K27me1) and
dimethylation of lysine 9 of H3 (H3K9me2), which are
associated with the repressive state [3-5].
Although studied for over 100 years [6] (for example,
with respect to cell division), chromosomal architecture,
and thus higher-order chromatin organization, has not
been a major focus of epigenetic research. Until recently,
the lack of high-resolution techniques made structural studies
of the nucleus extremely difficult. Nevertheless, chromatin
condensation as seen in heterochromatin, reflecting
chromosomal architecture, could be viewed as the first
described epigenetic mark [7,8]. Recently, it became possible
to study chromosomal architecture in more detail, on both
a global and a local scale, for instance with respect to
physical interactions between enhancers and promoters [9,10].

In plants, chromosomal architecture has been studied for 
many years using cytogenetic techniques and microscopic 
observations. Early studies allowed the discovery of the 
basic chromosome conformations, heterochromatin and 
euchromatin, which were first described in mosses by 
Emil Heitz as early as 1929 [7]. Most condensed chromatin, 
or heterochromatin, is associated with centromeric regions. 
However, large heterochromatic regions outside the peri- 
centromeres were also detected and, because of their 
microscopic appearance, were termed 'knobs'. Although 
first observed and best described in maize [11], knobs 
were also shown to exist in the model plant Arabidopsis, 
on chromosomes 4 and 5 [12-14]. The heterochromatic
knob on the short arm of chromosome 4 (hk4s) is derived
from an inversion event, which caused a pericentromeric
region to lie in a more centrally located region of the
chromosome arm. Owing to its length of 750 kb, hk4s is
easily detectable, and is therefore the best studied knob in
Arabidopsis. By contrast, the knob on chromosome 5, which
is only 60 kb long, is poorly described. Despite its central,
and therefore euchromatic, position on the chromosome
arm, hk4s has kept the heterochromatic features of its
pericentromeric origin. The knob hk4s is characterized by
low gene density and an abundance of highly repetitive
sequences, such as transposable elements.

To date, two methods have been frequently used to 
study chromosomal architecture. For microscopic observa- 
tions, fluorescence in situ hybridization (FISH) visualizes 
chromosomal architecture by detecting specific sections 
of chromosomes through hybridization with fluorescently 
labeled probes. Over the past decade, a completely different 
set of methods has been developed, which are summarized 
as chromosome conformation capture (abbreviated to 3C) 
technologies [15,16]. 3C uses formaldehyde cross-linked 
chromatin that is subsequently digested and religated. 
This produces circular DNA, comprised of two restriction 
fragments that were initially in close spatial proximity 
within the nucleus. The abundance of these circular 3C
templates can then be used to calculate interaction frequencies
between two given fragments in the genome. In both
animal model systems and yeast, various studies have
successfully used 3C technologies since the first publication
in 2002 [15]. Whereas 3C is used to analyze pair-wise
interactions (one specific fragment interacting with another
specific fragment; that is, one to one), circular chromosome
conformation capture (4C) identifies genome-wide
interactions with a viewpoint of interest [17] (that is, one
to all). HiC, the most recent 3C technology, facilitates the
analysis of genome-wide interactions from all restriction
fragments of a genome (that is, all to all) [18].

In the plant field, however, the adoption of these technical
advances has been slower, and only a few studies
have been performed using 3C technology. A 3C study in
maize revealed chromatin looping at the paramutagenic
b1 locus [19], and another recent study showed the importance
of local DNA looping for the correct expression
of the flowering time regulator locus FLC [20]. Moissiard
and colleagues compared global changes in the interactome
between mutant atmorc6 and wild-type plants [21].
However, that study did not focus on a detailed description
of the chromosomal architecture of Arabidopsis nuclei.

Here, we provide insights into the general architecture 
of the Arabidopsis nucleus, using 4C applied to several 
viewpoints followed by Illumina sequencing. Our study 
aimed at characterizing global principles of chromosomal 
interactions and their correlations with epigenetic marks. 
Additionally, we found that the heterochromatic knob hk4s 
is characterized by a distinct interactome, which strongly 
resembles its pericentromeric origin. 

Results 

The current knowledge on chromosomal architecture in 
Arabidopsis is largely based on microscopic observations. 
Therefore, we aimed to gain insights into higher-order 
chromatin organization based on 4C technology, which 
promises to complement previously published FISH experi- 
ments, and to reveal novel mechanisms governing chromo- 
somal architecture. 

We performed 4C experiments on aerial tissue of
2-week-old Arabidopsis seedlings using thirteen specific
restriction fragments (viewpoints) distributed across all five
chromosomes (Figure 1A). Employing high-throughput
sequencing, 4C technology identifies sequences that physically
interact with a given viewpoint. Therefore, the
position and number of mapped 4C sequencing reads
define the interactome of the given restriction fragment
(that is, the viewpoint) in space (position) and in frequency
or specificity (number of reads).

To cover a wide distribution of chromosomal interactions,
we chose viewpoints that reside in various locations:
from pericentromeric, to mid-chromosome arm, to
distal positions (Figure 1A).

Data evaluation reveals robustness of 4C experiments 

To obtain the interactome of a given viewpoint, short 
sequence reads were mapped to restriction fragments. 
Figure 1 Primary circular chromosome conformation capture (4C) data analysis. (A) Schematic representation of the viewpoints chosen for
this study. Viewpoints were named according to nearby genes or according to a region of special interest (hk4s). (B) Cluster analysis representing
the reproducibility of biological duplicates. The letters 'A' and 'B' at the end of the names indicate biological replicates. (C) Power law scaling,
indicative of the interaction decay for all viewpoints, across a distance to the viewpoint from 1 kb to 10 Mb.



Mapped reads were subsequently merged into sliding windows
consisting of 100 HindIII restriction fragments. We then assigned
P-values to each window describing the specificity of the
interaction to a given viewpoint. To obtain these P-values,
read counts of 4C windows were compared with the probabilities
of a normal distribution. The parameters of this
distribution were calculated using 1,000 sets of windows,
each generated by random shuffling of 4C fragments.
As chromosome arms differ considerably in their length
and, therefore, their DNA amount, we calculated P-values
individually for each chromosome arm. Windows with
P < 0.01 were defined as specifically interacting with their
corresponding viewpoint and are, hereafter, referred to as
'preys'.
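
The window test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`window_sums`, `window_p_values`), the step of one fragment between windows, and the one-sided upper-tail P-value are all choices made for this example. The input is a list of read counts per restriction fragment for a single chromosome arm.

```python
import random
from statistics import NormalDist, mean, stdev


def window_sums(counts, size):
    """Sum read counts in sliding windows of `size` fragments (step: one fragment)."""
    return [sum(counts[i:i + size]) for i in range(len(counts) - size + 1)]


def window_p_values(counts, size=100, n_shuffles=1000, seed=0):
    """P-value per window against a normal null fitted to shuffled fragments.

    Assumption: the null mean and standard deviation are estimated from the
    window sums of `n_shuffles` random permutations of the fragment counts,
    and the P-value is the upper tail of that normal distribution.
    """
    rng = random.Random(seed)
    shuffled = list(counts)
    null_sums = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_sums.extend(window_sums(shuffled, size))
    # NormalDist needs a positive standard deviation, so the counts must vary.
    null = NormalDist(mean(null_sums), stdev(null_sums))
    return [1.0 - null.cdf(s) for s in window_sums(counts, size)]


# Toy chromosome arm: uniform background with one enriched stretch of fragments.
toy_counts = [1] * 200
toy_counts[100:105] = [50] * 5
p_values = window_p_values(toy_counts, size=5, n_shuffles=200)
preys = [i for i, p in enumerate(p_values) if p < 0.01]
```

In this toy input the window starting at fragment 100 covers the enriched stretch and is called a prey, whereas background windows are not. Running the function separately per chromosome arm mirrors the per-arm calculation described in the text.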

The mappability of sequencing reads poses a major
concern for any genomic study. Owing to the incomplete
assembly of centromeric repeats in the Arabidopsis reference
genome, we excluded regions within 100 kb distance
of the centromere. Visual inspection of genomic Illumina
sequencing data revealed an even distribution of mapped
reads along the remaining chromosome sequence and,
therefore, no other major mappability biases were identified.

To ensure the reproducibility of this study, 4C experiments
were performed in duplicate. Correlations between
duplicates and different viewpoints were calculated using
the sum of reads per window. Spearman correlation coefficients
were high for duplicates (mean ± SD 0.88 ± 0.07),
and relatively low for different viewpoints (0.26 ± 0.31).
However, interacting viewpoints and viewpoints located in
close proximity (see Figure 1A), such as the two viewpoints
at the MEDEA (MEA) locus, had correlation coefficients
close to those of replicates of the same viewpoint. Cluster
analysis supported these findings (Figure 1B), further
demonstrating that viewpoints on the same chromosome
arm also show higher correlations with each other than
with viewpoints located on other chromosome arms.
Taken together, these analyses reveal the robustness of
our data.
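
As an illustration of the replicate comparison, the following sketch computes a Spearman correlation coefficient from per-window read sums. It is a plain-Python example written for this text; the helper names are hypothetical, and a statistics library would normally be used instead.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-window read sums for two replicates of one viewpoint.
replicate_a = [120, 80, 15, 3, 40, 7]
replicate_b = [110, 95, 10, 5, 38, 6]
rho = spearman(replicate_a, replicate_b)
```

Because the coefficient depends only on ranks, it is insensitive to differences in sequencing depth between replicates, which is one reason rank correlations are a common choice for this kind of comparison.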

To differentiate between random interactions, which
are mainly dependent on chromosomal proximity to
the viewpoint, and specific interactions, we estimated the
genomic distance-dependent decay of the interaction
probability over a distance of 1 kb to 10 Mb from the
viewpoint. For this, we pooled 4C reads of all viewpoints
within the given distance to their viewpoints. Performing
linear regression on logarithmized distance and contact
probabilities, we calculated a slope of -0.73, that is, the
contact probability decays with a power law function of
distance^(-0.73) (Figure 1C). This result resembles similar
analyses of the Drosophila (-0.85) [22] and human (-1.08)
[18] genomes.
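
The slope estimate can be illustrated with an ordinary least-squares fit on log-transformed values. The sketch below is an assumption-laden example: the function name `power_law_slope` and the synthetic input are invented here, and the binning of pooled reads by distance is not shown.

```python
import math


def power_law_slope(distances, contact_probabilities):
    """Least-squares slope and intercept of log10(probability) vs log10(distance)."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(p) for p in contact_probabilities]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Synthetic contact probabilities that follow P(d) proportional to d^(-0.73)
# over 1 kb to 10 Mb, used only to show that the fit recovers the exponent.
distances = [10 ** k for k in range(3, 8)]
probabilities = [d ** -0.73 for d in distances]
slope, intercept = power_law_slope(distances, probabilities)
```

A straight line in log-log space corresponds to a power law in linear space, so the fitted slope is the decay exponent.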

Cis interactions are enriched within chromosome arms

Because the replicate correlation was high, we pooled
replicates for a common representation of the 4C interactome
(Figure 2A,B) using the software Circos [23].
Figure 2C illustrates an example of a more detailed
representation of 4C interactomes for the FIS2 viewpoint.
All other representations of individual viewpoints are shown
in the additional files (see Additional files 1 to 13:
Figures S1 to S13). At first sight, we observed an apparent enrichment in
inter-chromosomal interactions of distal regions of chromosomes
(Figure 2A). Additionally, intra-chromosomal
interactions appeared to occur mostly locally around
the viewpoint and between the distal regions of the two
chromosome arms (Figure 2B and Figure 2C).

Interactions can be categorized into cis and trans inter- 
actions, which require different analysis techniques [24]. 
Cis interactions (Figure 2B) refer to intra-chromosome 
interactions, whereas trans interactions (Figure 2A) are 
defined as inter-chromosome interactions. 

By visual inspection of the interaction frequencies, we
observed that local interactions rarely spread across the
centromeres (Figure 2B, Figure 2C; see Additional files 1
to 13: Figures S1 to S13), indicating that interactions between the
two arms of the same chromosome (that is, the inter-arm
interactions) are distinct from the intra-arm interactions,
thus splitting the cis interactions into two groups.

Therefore, we investigated whether chromosomes, or
rather chromosome arms, are the basic unit of nuclear
architecture. To answer this question, we calculated
the average number of reads per million (RPM) for each
chromosome arm, and defined three chromosome arm
types: the chromosome arm hosting the viewpoint
(viewpoint arm), the other arm on the same chromosome
as the viewpoint (cis arm), and arms of all other chromosomes
(trans arms). We observed the highest interaction
frequencies and, therefore, the highest mean RPM values
within the viewpoint arm (Figure 3A), showing that a high
proportion of chromosomal interactions occur within the
same arm.
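
The three arm classes can be made concrete with a short sketch. It assumes reads are given as (chromosome, position) pairs and that one centromere coordinate per chromosome is known; the function names and the simple reads-per-million scaling by total library size are assumptions of this example, not a description of the normalization used in the study.

```python
from collections import Counter


def arm_of(chrom, pos, centromeres):
    """Assign a position to the left or right arm of its chromosome."""
    return (chrom, "left" if pos < centromeres[chrom] else "right")


def mean_rpm_by_arm_class(reads, viewpoint, centromeres):
    """Mean reads per million for the viewpoint arm, the cis arm, and trans arms."""
    vp_arm = arm_of(viewpoint[0], viewpoint[1], centromeres)
    per_arm = Counter(arm_of(c, p, centromeres) for c, p in reads)
    total = sum(per_arm.values())
    classes = {"viewpoint": [], "cis": [], "trans": []}
    for chrom in centromeres:
        for side in ("left", "right"):
            arm = (chrom, side)
            rpm = per_arm.get(arm, 0) / total * 1e6
            if arm == vp_arm:
                classes["viewpoint"].append(rpm)
            elif chrom == vp_arm[0]:
                classes["cis"].append(rpm)
            else:
                classes["trans"].append(rpm)
    return {name: sum(v) / len(v) for name, v in classes.items() if v}


# Toy genome with two chromosomes and a viewpoint on the left arm of chr1.
toy_centromeres = {"chr1": 100, "chr2": 100}
toy_reads = (
    [("chr1", 40)] * 6 + [("chr1", 150)] * 2 + [("chr2", 20), ("chr2", 180)]
)
arm_rpm = mean_rpm_by_arm_class(toy_reads, ("chr1", 50), toy_centromeres)
```

Averaging over arms within the trans class keeps the comparison fair when there are many more trans arms than cis arms.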

Interactions with cis arms were significantly more
frequent than those with trans arms (Student's t-test,
P = 0.0135 for replicate A and P = 0.0129 for replicate
B). However, the differences were small compared with
those between the RPM values for the viewpoint arm and the cis arm
(Student's t-test, P = 1.4 x 10"^^ for replicate A and P =
1.7 x 10'^^ for replicate B) (Figure 3A). A large proportion
of interactions within the viewpoint arm occurred within
the close vicinity of the viewpoint itself. To investigate
whether long-range interactions also preferentially occur
within the viewpoint arm, we excluded regions surrounding
the viewpoints by 2 Mb on each side of the viewpoint
(Figure 2A). Devoid of the viewpoint region, the RPM
values were strongly reduced; however, they were still
significantly higher than those of the cis arms (Student's
t-test, P = 0.012 for replicate A and P = 0.010 for replicate B).

The difference between the trans and cis arms appears
to be dependent on the distance of the viewpoint from the
centromere. Distal viewpoints (for example, MEA and
CYTOKININ-INDEPENDENT1 (CKI1); see Additional
file 1: Figure S1; Additional file 2: Figure S2; Additional
file 6: Figure S6) did not appear to interact preferentially
with their respective cis arm compared with the trans
arms. This could be observed by comparing the overall
interaction values of the viewpoint's respective cis arm
with the overall interaction values of the trans
arms. By contrast, viewpoints residing in the vicinity of
the centromeres (for example, YAOZHE (YAO) and
AT3G44380; see Additional file 7: Figure S7; Additional
file 10: Figure S10) exhibited increased cis arm interactions
compared with trans arm interactions and, thus, limited
spreading of local interactions across the centromere.

In summary, intra-arm interactions were about ten-fold
more frequent than inter-arm interactions, whereas inter-arm
and inter-chromosomal interactions differed by about
two-fold on average. Therefore, our results show that
chromosome arms are the main interaction unit, and
that interaction frequencies decrease sharply close to the
centromeres.

Linear position along the chromosome influences the 
interaction potential of the viewpoint 

We found that trans interactions could make up to 50% 
of the total interactome of a given viewpoint. Therefore, 
we were interested in understanding the mechanisms 

Figure 2 Summary of circular chromosome conformation capture (4C) interactomes. Circos plots illustrate the 4C interactome, transcription
rate, and chromosomes with euchromatic and centromeric regions. Line color refers to the color of the viewpoint names at the periphery of the
Circos plots. Only interactions with a P< 10"^ are plotted. (A) Trans interactions; (B) cis interactions; (C) 4C interactome of viewpoint FIS2. Color
code refers to significance levels. Gene density (blue circles) and transposable element density (purple circles) are indicated to illustrate the
occurrence of heterochromatin and euchromatin. The region covered by the knob hk4s is highlighted with a transparent rectangle on the short
arm of chromosome 4. Interaction values, equal to Σ_i log2(number of reads in fragment_i), where i stands for a fragment within a given window,
are scaled to the viewpoint's total library size.



governing trans interactions. Visual inspection of 4C data
(Figure 2A, Figure 2C; see Additional files 1 to 13:
Figures S1 to S13) suggested an effect of the viewpoint positions
along the chromosome arms on the trans interaction frequencies.
We hypothesized that chromosomal interactions
do not solely reflect specific functions of a given region,
but are rather a consequence of physical constraints. To
investigate whether the positioning of the viewpoints along
the chromosome arm is a major constraint for trans interactions,
we tested whether regions with similar distance to
the centromeres are more likely to interact.

We calculated the relative distance to the centromeres
within which 50% of all 4C reads could be found (dist0.5). As a
considerable proportion of all interactions could be found
surrounding the viewpoint and would therefore distort
the analysis, we excluded the viewpoint arm. A significant
correlation between dist0.5 and the relative distance of
the viewpoint to the centromere could be observed
(Spearman correlation coefficient = 0.722; linear model
P = 3.4 x 10'^^) (Figure 3B). This suggests that regions
with a similar relative distance to their corresponding 
centromeres are likely to co-localize with each other in the 
three-dimensional space of the nucleus. This observation 
was most pronounced in distal regions; however, it was also 
observable in regions in proximity to the pericentromeres. 
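
One way to read the dist0.5 statistic is as the median relative distance of reads from the centromere, where 0 is the centromere and 1 is the telomere. The sketch below follows that reading; it is an interpretation for illustration, and the function names and inputs are hypothetical.

```python
import math


def relative_distance(pos, centromere, arm_end):
    """Relative position on an arm: 0 at the centromere, 1 at the telomere end."""
    return abs(pos - centromere) / abs(arm_end - centromere)


def dist_half(relative_distances):
    """Smallest relative distance within which at least 50% of the reads fall."""
    if not relative_distances:
        raise ValueError("at least one read is required")
    ordered = sorted(relative_distances)
    k = math.ceil(len(ordered) / 2)
    return ordered[k - 1]


# Hypothetical reads outside the viewpoint arm, already converted to
# relative distances from their respective centromeres.
toy_relative = [0.1, 0.2, 0.9, 0.8]
toy_dist_half = dist_half(toy_relative)
```

Using relative rather than absolute distances makes arms of different lengths comparable, which matches the 0-to-1 scale used in Figure 3B.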

Distal chromosomal regions show an increased trans 
interaction potential 

We hypothesized that the flexibility of a chromosome arm 
is a major physical constraint influencing the interaction 
potential of a viewpoint. Assuming that centromeres act 
as chromosomal anchors, distal regions of chromosome 
arms should exhibit a higher flexibility than regions close 
to the centromere [25-28]. Hence, we predicted that distal
viewpoints should exhibit an increased trans interaction 
potential. 

Therefore, we tested the correlation between the absolute 
distance of the viewpoint to the centromere and the reads 
per kilobase per million (RPKM) of 4C reads found in trans
(including the cis arm) (Figure 3C). Distal viewpoints
were shown to interact more frequently with regions in 
trans than did viewpoints residing closer to the centromere 
(Spearman correlation coefficient = 0.774, linear model 
P=10-^) (Figure 3C). 

These results indicate that the localization of a viewpoint 
along the chromosome arm significantly influences its 
interaction pattern. 

Principal component analysis showed a correlation 
between the epigenetic landscape and the interactome 

The interplay of epigenetic marks, such as histone modifi- 
cations, and physical interactions of two sequences 
were previously shown to be important for stringent 
gene regulation [20,22,29,30]. Therefore, we investigated 
whether specific epigenetic marks can be correlated with 
long-range interactions. 

We obtained previously published histone modification
data [31], specifically H3K4me2, H3K4me3, H3K9me2,
H3K27me1, H3K27me3, H3K36me2, H3K36me3, H3K9ac,
and H3K18ac. From the same dataset, we included
transcriptome, histone H3 occupancy, and genomic DNA
control data. Additionally, we obtained publicly available
CG, CHH, and CHG DNA methylation data [32]. Because
data obtained from chromatin immunoprecipitation (ChIP)
for histone modifications cannot be directly compared with
4C data due to the different scaling of the two datasets [24],
we calculated density values of each epigenetic feature
within 4C windows. We defined the epigenetic modification
densities (EMDs) as the number of nucleotides covered by
at least one uniquely alignable short sequence, divided by
the total number of nucleotides for each individual 4C
restriction fragment (that is, the length of the restriction
fragment). Subsequently, the mean for each window was
calculated. To adjust the scale of the 4C data to the EMDs,
we chose a window size of 25 fragments, which still conferred
satisfactory reproducibility between replicates. 4C
windows were categorized into prey regions (windows with
an interaction P-value of <0.01) and randomly
chosen control regions.
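
The density definition above can be sketched as follows. The example assumes half-open (start, end) coordinates for fragments and aligned reads, and a step of one fragment between windows; these conventions and the function names are assumptions made for illustration.

```python
def covered_fraction(fragment, read_intervals):
    """Fraction of a fragment's nucleotides covered by at least one read.

    `fragment` and each read are half-open (start, end) intervals.
    """
    start, end = fragment
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in read_intervals
        if s < end and e > start
    )
    covered = 0
    cursor = start  # rightmost position already counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return covered / (end - start)


def window_means(values, size=25, step=1):
    """Mean of per-fragment densities in sliding windows of `size` fragments."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, step)
    ]


# Toy fragment of 100 bp with three reads, two of which overlap each other
# and one of which extends past the fragment end.
toy_density = covered_fraction((0, 100), [(10, 30), (20, 50), (90, 120)])
toy_windows = window_means([1.0, 2.0, 3.0, 4.0], size=2)
```

Counting covered nucleotides rather than reads bounds the density between 0 and 1 for every feature, which is what makes tracks with very different sequencing depths comparable.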

If specific histone modifications or sets of histone modi- 
fications are associated with an interaction pair, it could be 
assumed that prey regions of a given viewpoint would 
share a common epigenetic environment, reflected by a 
particular composition of the EMDs. To elucidate how 
Figure 3 Physical constraints of chromosomal architecture. (A)
Number of reads per million for four distinct classes of interactomes.
Viewpoint: circular chromosome conformation capture (4C) reads
that map on the same chromosome arm as the viewpoint.
Viewpoint (removed): interactions mapping the viewpoint's arm,
excluding interactions that map within 2 Mb distance on either side
of the viewpoint. Cis: 4C reads that map to the other arm of the
chromosome harboring the viewpoint. Trans: 4C reads that map to
all other chromosome arms. (B) The relative distance to the
centromere (0 at the centromere, 1 at the telomere) in which 50%
of the 4C reads can be found depends on the relative distance of
the viewpoint to the centromere. (C) The percentage of 4C reads
that can be mapped to trans arms was positively correlated with the
viewpoint's absolute distance to the centromere in base pairs (bp).
In all parts, red circles represent replicate A, blue circles represent
replicate B.



histone modifications are related to the interactome, we 
performed principal component analysis (PCA) (Figure 4A). 
For each viewpoint, the mean EMDs (selecting only histone 
modification data) of prey and control regions were 
calculated and included in the PCA. As the first principal 
component was found to explain 97% of the total variation, 
it was the only component used for further analyses. 
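
To show what "the first principal component explains most of the variation" means in practice, the sketch below extracts the first component of a small matrix by power iteration on the covariance matrix. It is a teaching example only: the function name, the random start vector, and the toy data are assumptions, and a real analysis would use an established linear algebra or statistics package.

```python
import math
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Return (loadings, explained variance ratio) of the first component.

    `rows` holds one observation per row (for example, one prey or control
    group per viewpoint) and one feature per column (for example, one EMD).
    The data must not be constant, otherwise the covariance matrix is zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(d)]  # random start avoids symmetric traps
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)
    )
    total_variance = sum(cov[a][a] for a in range(d))
    return vec, eigenvalue / total_variance


# Toy data: two features that vary in opposite directions across observations,
# loosely analogous to an activating and a repressive mark.
toy_rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 2.0], [5.0, 1.0]]
loadings, explained = first_principal_component(toy_rows)
```

In the toy data the two features are perfectly anti-correlated, so the loadings have opposite signs and a single component captures all of the variance.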

Two opposing groups of EMDs, H3K36me3/H3K4me2
and H3K27me1/H3K9me2, were found to be the major
contributors to the first principal component of the PCA
(Figure 4A, arrows). Closer observation of three viewpoint/
prey pairs revealed how EMDs and interaction frequencies
are coupled (Figure 4C). Euchromatic viewpoints, such
as FLOWERING WAGENINGEN (FWA) (Figure 4C, top
row), which are characterized by low levels of H3K27me1
and enrichment of H3K36me3, preferentially interacted
with regions of a similar EMD pattern. This is evident
from the increased H3K36me3 levels surrounding the
region of high interaction frequencies and local peaks of
H3K27me1 enrichment, coinciding with a significant drop
in interaction frequencies (Figure 4C, top row, right panel).
By contrast, heterochromatic viewpoints (Figure 4C, middle
and bottom rows), which are characterized by the inverse
EMD composition, preferentially interacted with regions
exhibiting low H3K36me3 and high H3K27me1 levels.
For example, local enrichment of H3K27me1 coincided
with increased interaction frequencies to PHE1 (Figure 4C,
middle row, right panel). Moreover, the asymmetric local
interactions surrounding hk4s appeared to be reflected
by the asymmetric distribution of H3K27me1 (Figure 4C,
bottom row, left panel).

Additionally, we performed PCA separately for indi- 
vidual viewpoints (see Additional file 14: Figure S15). 
Although the same EMDs could be identified as major 
factors for most viewpoints, the first component of the 
PCA was less dominant, indicating a more complex collab- 
oration of factors separating control regions from prey 
regions. Furthermore, various viewpoints did not show 
a very clear separation of prey and control regions. 
Figure 4 Crosstalk of epigenome and interactome. (A) Principal component analysis (PCA) using mean epigenetic modification densities
(EMDs) of control and prey regions for each viewpoint. EMDs included in the PCA were: H3K4me2, H3K4me3, H3K9me2, H3K27me1, H3K27me3,
H3K36me2, H3K36me3, H3K9ac, and H3K18ac. Colored arrows represent the two highest contributing EMDs to the variance of the first
component in positive and negative direction, respectively. Note that the first principal component explains almost all the variance (97%), and
therefore, this was the only component plotted. Prey regions are represented by green dots, control regions by yellow dots. (B) Mean CG
methylation densities of prey and control regions for individual viewpoints. The mean was calculated across 1,000 random samples of 50
prey and 50 control regions, respectively. Green bars represent preys and yellow bars represent controls. (C) Examples of the interactome-
epigenome interplay for three different viewpoints and one of their corresponding prey regions. Top track: log summed 4C reads per window
(100 fragments, starting every fragment). 4C reads of replicate A are plotted in the positive intercept, and 4C reads of replicate B are plotted in
the negative intercept. Middle track: EMD of the highest contributing factors of the PCA in positive and negative direction, respectively. In order
to achieve comparable representation of H3K36me3 and H3K27me1 densities, the density of every window (25 fragments, starting every 5
fragments) was divided by the mean density of each histone modification. Arrowheads point at regions where the 4C interactome and local
EMD peaks appeared to correlate. FWA: viewpoint on chromosome 4, 12 to 14 Mb; prey on chromosome 5, 23 to 25 Mb. PHE: viewpoint on
chromosome 1, 23.5 to 25.5 Mb; prey on chromosome 1, 20 to 22 Mb. hk4s: viewpoint on chromosome 4, 0.8 to 2.8 Mb; prey on chromosome 2,
4 to 6 Mb.



Interestingly, this was most evident for viewpoints
whose preys are associated with heterochromatic marks
(PHERES1 (PHE1), hk4s, AT1G51860) (see Additional
file 14: Figure S15).

To address the individual contribution of epigenetic 
marks to the interactome, we performed a test based 
on a modified Gene Set Enrichment Analysis (GSEA) 
[33]. In summary, we tested whether prey regions would 
show a non-random distribution in their EMD profiles 
(see Materials and Methods for a detailed description). 
The obtained empirical P-values are indicative of the
likelihood of a random set of regions to show a similar
distribution of EMD values as the tested prey regions
(Table 1).

To independently investigate whether control and prey 
regions differ significantly for individual epigenetic fea- 
tures, we developed a permutation test. In the first step, 
we calculated for each viewpoint the mean density for 
each epigenetic feature (Figure 4B and Additional file 15: 
Figure S16). Epigenetic features that coincide with the 
occurrence of heterochromatin and euchromatin, such 
as DNA methylation, clearly split the viewpoints into two 
Table 1 Analysis of the epigenetic landscape

Genomic feature                      P-value (a)
                                     Permutation test    GSEA-like test
H3                                   0.1013              0.0779
H3K18ac (b)                          0.0335              0.0178
H3K27me1 (b)                         0.0249              0.0084
H3K27me3                             0.3355              0.099
H3K36me2 (b)                         0.0033              0.0051
H3K36me3 (b)                         0.0033              0.0054
H3K4me2 (b)                          0.0033              0.0051
H3K4me3 (b)                          0.0037              0.0051
H3K9ac (b)                           0.0033              0.0051
H3K9me2 (b)                          0.0325              0.0057
Transcription (b)                    0.0033              0.0054
CG methylation replicate 1 (b)       0.0065              0.0054
CHG methylation replicate 1 (b)      0.0083              0.0051
CHH methylation replicate 1 (b)      0.0083              0.0051
CG methylation replicate 2 (b)       0.0083              0.0054
CHG methylation replicate 2 (b)      0.0087              0.0051
CHH methylation replicate 2 (b)      0.0083              0.0051
Genomic DNA                          0.0871              0.056

(a) Table contains adjusted P-values (false discovery rate; FDR (Benjamini-Hochberg))
for genomic features tested with a permutation test or a Gene Set Enrichment
Analysis (GSEA)-like algorithm.

(b) Genomic features differing significantly between prey and control regions
(α = 0.05).



groups. Whereas viewpoints such as PHE1, AT1G51860,
and hk4s had high methylation levels in their prey regions
and low methylation levels in control regions, viewpoints
that occur in euchromatin showed an inverse pattern. Similar
patterning was also detectable for other epigenetic
modifications (Figure 4B; see Additional file 15: Figure S16).

The inverse patterning of the epigenetic landscape
between different viewpoints made it difficult to perform
statistical tests using EMD values directly. Therefore,
we calculated the absolute difference in the density of
each epigenetic feature between control and prey
regions. In essence, we tested whether the absolute
difference in EMD values between prey and control
regions was significantly different from the absolute
difference between two sets of randomly selected regions.
As a test set, we shuffled the 50 prey and 50 control
regions into two randomized groups. As for the prey
and control regions, we then calculated means and
subsequently absolute differences between the two randomized
groups. By repeating the permutations 1,000 times, we
obtained a distribution of absolute differences between
the two randomized groups for each epigenetic feature.
This allowed us to calculate empirical P-values, which
describe the chance that two randomly selected sets of
regions would differ more in their EMD setup than would
prey and control regions (Table 1).
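
The permutation scheme described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function name permutation_p_value, the fixed random seed, and the choice to count permuted differences that are greater than or equal to the observed one are assumptions made for illustration only, and the multiple-testing correction applied in Table 1 is not included.

```python
import random


def permutation_p_value(prey, control, n_perm=1000, seed=0):
    """Empirical P-value for the absolute difference of group means.

    prey, control: per-region densities of one epigenetic feature.
    The function name, the seed and the ">=" convention are
    illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    # Observed absolute difference between prey and control means
    observed = abs(mean(prey) - mean(control))

    pooled = list(prey) + list(control)
    n_prey = len(prey)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle all regions into two randomized groups
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_prey]) - mean(pooled[n_prey:]))
        if diff >= observed:
            exceed += 1
    # Fraction of permutations at least as extreme as the observation
    return exceed / n_perm
```

With 50 prey and 50 control values per feature, one call per epigenetic feature yields one raw empirical P-value, which would then still need to be adjusted for multiple testing.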

In line with the previously performed PCA, both tests 
revealed that the densities of most epigenetic features 
differed significantly between control and prey regions 
(Table 1). Histone H3 occupancy, however, did not differ 
significantly between the two groups, indicating that 
histone density itself does not correlate with a viewpoint's
interactome. Additionally, no significant difference in
genomic control data could be observed, rendering possible 
sequencing and alignment biases of the analyzed EMD 
dataset unlikely. 

In summary, we conclude that the epigenetic landscape 
coincides with the interactome. This is mainly reflected by 
distinct euchromatic and heterochromatic interactomes. 

The heterochromatic knob evades its euchromatic 
environment 

Analyzing the read numbers of a first set of 4C viewpoints, 
we consistently observed a drop in read numbers for a re- 
gion situated in the center of the short arm of chromosome 
4 (Figure 5B; see Additional file 1: Figure S1; Additional
file 2: Figure S2; Additional file 3: Figure S3; Additional file
4: Figure S4; Additional file 5: Figure S5; Additional file 6:
Figure S6; Additional file 7: Figure S7; Additional file 8:
Figure S8; Additional file 9: Figure S9; Additional file 10:
Figure S10; Additional file 11: Figure S11; Additional
file 12: Figure S12; Additional file 13: Figure S13).
Unexpectedly, this drop in interaction frequency was observed
irrespective of the location of the viewpoint. Additionally, 
we did not observe this drop with visual inspection of 
genomic sequencing data, implying no mappability bias. 
Therefore, we hypothesized that global constraints of 
chromosomal architecture govern genome-wide interac- 
tions with this region. 

Exploring the region in more detail, we found that it
corresponds to the heterochromatic knob (hk4s), which
is cytogenetically detectable and has been described
previously [12,34] (see Additional file 9: Figure S9).

To analyze the implications of hk4s on chromosomal 
architecture in more detail, we designed three additional 
4C assays. We set a viewpoint within hk4s and two
viewpoints flanking hk4s in a more distal region (SWINGER
(SWN)) and a more proximal region (YAO) of the short
arm of chromosome 4. As the flanking viewpoints were set 
relatively close to hk4s, we expected increased frequencies 
of interactions within the knob and the viewpoints, owing 
to the previously observed local enrichment of interactions 
surrounding the viewpoints. However, the local interaction 
frequency of both neighboring viewpoints dropped sharply 
on the borders of hk4s (Figure 5A, Figure 5B; see Additional
file 8: Figure S8; Additional file 9: Figure S9; Additional file
10: Figure S10). YAO (coordinate at 2.75 Mb) is situated
adjacent to the border of the pericentromere (coordinates 
Figure 5 Interactome of the knob hk4s. (A) Circos plot illustrating all cis and trans interactions of viewpoints located on chromosome 4. Only
interactions with P< 10"^ were considered. Line color corresponds to the color of the viewpoint's name indicated at the periphery of the plot.
Chromosomes are not drawn to scale. (B) Representation of interaction frequencies for viewpoints situated on chromosome 4. Note that only the
region up to 4 Mb is plotted; therefore, viewpoints AG and FWA cannot be seen. Black dots show positions of viewpoints; turquoise dots, genes;
violet dots, transposable elements; light grey, euchromatic chromosomal segment; dark grey, heterochromatic chromosomal segments; dark grey 
ellipse, centromere. (C) Model of a potential mid-range chromosomal loop, connecting hk4s with the centromere of chromosome 4. 



2.78 to 5.15 Mb) [3]. Interestingly, the local interaction
pattern appears to be asymmetric. We observed a loss 
of specific interactions not only along the boundary to 
the knob but also along the much closer border of the 
pericentromeric region (Figure 5B; see Additional file
10: Figure S10). The defined sharp boundaries for local
YAO interactions resembled the interaction pattern of 
hk4s. Whereas YAO resides in euchromatin surrounded 
by heterochromatin, hk4s can be viewed as its counterpart, 
residing in heterochromatin but surrounded by euchroma- 
tin (Figure 5B). 

Regions situated on the long arm of chromosome 4
(AGAMOUS (AG) and FWA) interacted strongly with
regions surrounding hk4s, including YAO, but not with
hk4s itself (Figure 5B; see Additional file 11: Figure S11;
Additional file 12: Figure S12), resembling the sharp drop
in the interaction frequencies of SWN and YAO (Figure 5A,
Figure 5B; see Additional file 8: Figure S8; Additional file 9:
Figure S9; Additional file 10: Figure S10).

Consistent with observations for the two flanking view- 
points, the significant local interaction frequencies of the 
viewpoint set in the center of hk4s were limited by the 
borders of the knob. Additionally, we observed strong 
interactions of hk4s with the pericentromeric regions of 



chromosome 4 and with the pericentromeres of other 
chromosomes (Figure 5A). The apparent absence of spe- 
cific interactions between hk4s and the pericentromere of 
the short arm of chromosome 4 is likely to be an artifact 
of the method used to assign P-values. Indeed, as P-values 
were calculated for individual chromosome arms, the high 
number of reads covering the viewpoint itself masks other 
regions on the same chromosome from being associated 
with low P-values.

Discussion 

Replication and the choice of appropriate window size 
are key to ensuring robustness of 4C 

Based on a correlation analysis of biological replicates,
we show that 4C interaction profiles in Arabidopsis 
can be reproducibly obtained. However, reproducibility is 
dependent on the window size chosen. As chromosomal 
interactions are dynamic and partly stochastic, one single 
restriction fragment of two replicates can vary consider- 
ably in read number. Taking windows consisting of several 
fragments into account can balance this variation. As we 
were mainly interested in the global architecture of the 
Arabidopsis nucleus, we chose window sizes of up to 100 
restriction fragments. However, the resolution for studying 



Grob et al. Genome Biology 2013, 14:R129 
http://genomebiology.com/2013/14/11/R129






short-range interactions is decreased by increasing the 
window size. Whereas 4C is well suited to study mid- 
range and long-range interactions in Arabidopsis, it is 
not necessarily the method of choice to study short-range 
interactions (for example, promoter/enhancer interactions). 
Regulatory sequences that are presumably involved in 
short-range interactions, such as chromatin loops, are 
often separated by less than a few kb. They are, therefore, 
difficult to analyze using 3C technologies, which rely on 
a sufficient number of restriction sites between the two 
regions of interest to confer satisfactory resolution. 

Arabidopsis and Drosophila show comparable chromatin 
compaction and genome size 

The interaction decay exponent describes the slope with 
which the interaction probability decays from the view- 
point. Therefore, it can provide an approximation of 
regional chromosomal compaction. Theoretically, a steeper 
slope indicates decreased flexibility of a given viewpoint, as 
distant regions are less likely to interact with it. Decreased 
flexibility can be interpreted as higher local chromatin 
compaction. Drosophila and Arabidopsis are similar with 
respect to chromosome number, genome size, total number 
of genes, and nuclear volume [1,35]. These characteristics 
could lead to similar constraints of chromosomal architec- 
ture. The interaction decay exponent determined in this 
study (-0.73) is close to that described earlier for Drosophila 
(-0.85) [22]. Interestingly, the interaction decay exponent 
in human nuclei is lower (-1.08), implying higher local 
compaction [18]. This observation is consistent with the 
physical characteristics of human nuclei compared with 
those in Arabidopsis and Drosophila. Although varying
considerably, human nuclei show a lower volume/DNA
ratio than the nuclei in Drosophila and Arabidopsis,
indicating a higher global chromatin compaction [35]. 
It is important to mention, however, that interaction 
decay exponents cannot be compared very easily between 
different studies, as the calculated exponents of the power 
law scaling depend on the range of distances used for 
calculations. However, which scale best describes an 
overall distance-dependent interaction decay is a matter of 
debate. Additionally, the slope with which interactions 
decay was previously shown to vary between domains 
with different epigenetic landscapes [18,22]. We observed 
a variation in interaction decay exponents between the dif- 
ferent viewpoints, from -0.56 to -0.96 (see Additional file 
16: Figure S14). However, we could not explain these dif- 
ferences, either by the positional or by the epigenetic en- 
vironment of a given viewpoint. Therefore, the global 
distance-dependent interaction decay does not necessarily 
add to the understanding of how interaction frequencies 
decrease with distance from an individual viewpoint. 
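
To make the notion of an interaction decay exponent concrete, the following Python sketch estimates the slope of a power-law relationship by ordinary least squares on log-transformed values. It is only an illustration under stated assumptions: the function name decay_exponent, the use of log10, and the unweighted fit over whatever distance range is supplied are not taken from this study, and, as noted above, the resulting exponent depends on the range of distances passed in.

```python
import math


def decay_exponent(distances, frequencies):
    """Slope of log10(frequency) against log10(distance).

    distances: positive distances from the viewpoint (for example in bp).
    frequencies: positive interaction frequencies at those distances.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope in log-log space
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For data that follow frequency ∝ distance^(-0.73) exactly, the function returns -0.73 up to floating-point error; real 4C profiles are noisy, so the estimate will vary with binning and with the chosen distance range.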

How and whether global nuclear compaction and inter- 
action probability decay really correlate is not entirely clear. 



An exploration of the Arabidopsis linc1 linc2 double
mutant could possibly answer this question, as these plants
were reported to exhibit increased DNA density compared
with wild-type plants [1].

4C results refine the view on general chromosomal 
architecture in Arabidopsis 

The investigation of general features of chromosomal 
architecture in this study is consistent with previous 
findings studying Arabidopsis nuclei using cytogenetic 
methods [27,36]. However, 4C technology enables us to 
generate genome-wide interaction maps for various 
viewpoints and, hence, does not depend on a pair-wise 
analysis of two interacting sequences. This greatly adds to 
our understanding of general constraints on chromosomal 
architecture. 

Basic interaction units appear to be defined as chromo- 
some arms, with centromeres acting as a boundary. These 
findings are in agreement with an earlier study by Schubert 
and colleagues, reporting that chromosome arms are 
localized in distinct territories, as evidenced by FISH 
on Arabidopsis nuclei [36]. However, whether centromeres 
always act as strict boundaries cannot be conclusively 
answered, as the boundary effect of centromeres is likely 
to vary between the different chromosomes. 

We observed a strong influence of the chromosomal 
location of a viewpoint on its interaction potential. Re- 
markably, the linear organization of chromosomes was 
reflected in the overall interaction potential of a given 
viewpoint, despite the dense packaging of the genome 
in the nucleus. 

We propose that centromeres anchor the chromosomes 
in the nucleus, thereby allowing chromosome arms to 
protrude inside the nuclear volume [25-28]. The flexibility 
of chromosome arms thus increases with their length, 
allowing distant regions to interact more frequently in 
trans than more centrally located regions. Our hypothesis 
is supported by strong evidence for clustering of cen- 
tromeres and their adherence to the nuclear matrix in 
different model organisms [37-39]. Taken together, these 
findings may explain why regions with a similar distance to 
the centromeres, which act as anchor points, preferentially 
interact with each other. 

We also observed significant inter-telomeric interac- 
tions. A high interaction frequency of (sub-)telomeric 
regions in Arabidopsis was recently also shown by 
FISH [36]. In addition, previously published HiC data 
suggest increased interaction frequencies between telo- 
meres [21,38]. By contrast, telomeres and centromeres 
do not interact, indicating a strict separation of these 
two key organizational elements of Arabidopsis chro- 
mosomes. These findings are in line with previous 
studies, and may be explained by the nucleolar localization 
of telomeres [27,40]. 
Remarkably, in Drosophila, long-range interactions seem
to occur nearly exclusively within the viewpoint's
chromosomal arm [30]; however, in the present study, up to 50%
of all interactions were found to be outside this region. 
Whether this difference from Drosophila holds biological 
meaning is unclear. The presence of a higher number of 
individual cell types in the sample could theoretically 
increase the number of observable interactions, and result 
in a more complex interactome of a given viewpoint. Such 
increased complexity could thereby lead to an increased 
number of trans interactions. However, we do not estimate 
the number of cell types to be significantly different 
between the present study and the report by Tolhuis and 
colleagues, in which 4C was performed on Drosophila 
larval brain tissue [30], as the aerial seedling tissue used 
in our study is predominantly composed of mesophyll cells. 
The phase of the cell cycle might be a more important 
confounding factor. Over a cell cycle, chromosomal archi- 
tecture changes dramatically. Cells of Arabidopsis seedlings 
divide at high frequency, leading to a rather short time 
period in which cells reside in interphase. Therefore, the 
proportion of cells in specific stages of the cell cycle could 
be a major factor influencing the (average) chromosomal 
conformation of a population of cells. 

The interactome of a viewpoint is reflected in its 
epigenetic landscape 

PCA revealed two distinct groups of prey regions, which
could be discriminated mainly by the level of H3K36me3/
H3K4me2 and H3K27me1/H3K9me2 densities. Interestingly,
these histone modifications are commonly attributed
to euchromatin or heterochromatin, respectively [31].
Furthermore, the heterochromatic pair H3K27me1/H3K9me2
is described to be the major component of 'chromatin state
3', which is mainly associated with transposable elements,
as previously reported by Roudier and colleagues, whereas
the pair H3K36me3/H3K4me2 primarily contributes to
'chromatin state 1', associated with active genes [3]. Filion
and colleagues describe five distinct chromatin types in
Drosophila, distinguished by the composition of proteins
adhering to the DNA. H3K4me2 was shown to be most
abundant in 'red chromatin', which represents one of two
euchromatic chromatin states, whereas H3K9me2 is
enriched in 'green chromatin', which can best be described
as the classic heterochromatin of pericentromeric regions
[4]. As anticipated by previous cytological studies of
Arabidopsis nuclei, the interactome obtained by 3C technologies
can be separated into two distinct domains, correlating 
with both the epigenetic and the cytogenetic definition 
of heterochromatin and euchromatin. Interestingly, this 
distinction is not only confined to cis interactions but can 
also be observed at the level of the whole genome. In 
addition, we suggest a further discrimination of heterochro- 
matic interactions. The purely heterochromatic viewpoint 



hk4s predominantly interacts with visible heterochromatin 
such as the pericentromeric regions. PHE1, which shows
moderate H3K27me1 enrichment surrounding the
viewpoint, interacts predominantly with heterochromatic
islands within otherwise euchromatic regions (Figure 2, 
Figure 4C; see Additional file 4: Figure S4). 

Previous work in Arabidopsis has shown that homolo- 
gous pairing is decreased in hypomethylation mutants [41], 
indicating a role for cytosine methylation in long-range 
interactions. We observed significant differences between 
control and prey regions with respect to their CG, CHH, 
and CHG methylation densities. Additionally, transcription 
rates exhibited significant differences between prey and 
control regions. Whether transcriptionally active genes 
interact with each other is not clear, as the genes residing 
in our viewpoints were not evenly balanced with regard to 
their transcriptional state (active versus silenced), rendering 
them inappropriate for statistical analysis. 

Taking these results together, we conclude that interac- 
tomes share a common epigenetic landscape, leading to 
distinguishable heterochromatic and euchromatic interac- 
tomes. However, it is not clear to what extent individual 
epigenetic modifications influence the interactome, and 
to what extent the epigenetic landscape is the cause or 
consequence of a given interactome. 

The knob hk4s: exception or rule?

Finally, the knob hk4s appears as an exceptional feature 
within the Arabidopsis nuclear landscape, as it interacts 
predominantly with pericentromeric regions. We think that 
hk4s represents the exception that proves the rule because 
its interactome reflects the pericentromeric origin of hk4s, 
which arose by an inversion that placed a pericentromeric 
region into the center of the chromosome arm. As dis- 
cussed above, heterochromatic regions form a distinct 
interactome, in which heterochromatic islands that reside 
in a euchromatic environment are included. Figure 5C
illustrates a model suggesting overall chromosomal archi- 
tecture and chromosomal looping of hk4s to the clustered 
centromeres. Our results indicate that the knob hk4s acts 
as an interaction insulator for its neighboring regions, and 
conserves its pericentromeric origin with respect to its 
interaction frequencies. 

To date, neither a functional role as a (neo)centromere
nor an association with the nuclear matrix has been 
reported for hk4s. However, the specific interaction of 
hk4s with centromeres could raise speculation concerning 
the functional role of hk4s in the nucleus. The specificity 
of a given region to function as a centromere is surprisingly 
flexible. Previous reports show that in maize, centromere 
identity is not irreversibly defined. Wolfgruber and 
colleagues demonstrated that the centromere of maize 
chromosome 5 has moved to a new location, due to the 
invasion of non-centromeric retrotransposons, splitting the 
centromere into two. Consequently, one of the two cleavage
products lost its association with histone CenH3, which 
defines centromeres epigenetically by replacing the regular 
histone H3 protein [42]. In maize, centromere identity 
correlates with the abundance of centromeric retrotranspo- 
sons [43], which specifically invade centromeric regions. 
Nevertheless, centromere identity appears to be mainly 
controlled epigenetically and not by DNA sequence 
[44,45]. However, previous reports show that histone
CenH3 accumulation defines the functional centromere in 
Arabidopsis and that CenH3 is predominantly associated 
with the 178 bp centromeric repeats [46,47]. As the knob 
hk4s lacks the centromeric 178 bp repeats and is thought 
to originate from a pericentromeric region, which is not
associated with CenH3, we conclude that hk4s is mainly 
involved in heterochromatin formation, and that hk4s is 
unlikely to play a role as a (neo)centromere. 

Conclusions 

Centromeres are key elements for chromosomal organi- 
zation, as the position relative to the centromere strongly 
influences the interactome of a chromosomal region. 
We propose that the length of chromosome arms limits 
the mobility with which a region can traverse through the 
nuclear space and, therefore, influences the interaction 
potential in trans. Another hallmark of chromosomal 
architecture in Arabidopsis nuclei is the separation of two 
seemingly distinct interactomes, strongly correlating with 
visible heterochromatin and euchromatin. Interestingly, 
heterochromatic islands are partly able to evade their 
euchromatic context. The epigenetic landscapes of the 
heterochromatic and euchromatic interactome are clearly 
distinguishable. Therefore, histone modifications, which 
were previously described to be characteristic of chromatin 
states, may also be predictive for the interaction potential 
of a given chromosomal region. 

Materials and methods 

Nuclei extraction and 4C sample preparation 

Seedlings of Arabidopsis thaliana (L.) Heynh, accession
Columbia (Col-0), were grown for 14 days on MS plates
(4.3 g/l Murashige and Skoog salt (Carolina Biological
Supply Company, Burlington, North Carolina, USA), 10 g/l
sucrose (Applichem GmbH, Darmstadt, Germany), 7 g/l
PHYTAGAR (Life Technologies Europe, Zug, Switzerland),
pH 5.6). Aerial tissue of seedlings was collected
(approximately 10 g per sample), and distributed evenly between
four conical 50 ml tubes. Under vacuum, the seedlings
were incubated for 1 hour at room temperature in 15 ml
freshly prepared nuclei isolation buffer (NIB: 20 mmol/l
Hepes (pH 8), 250 mmol/l sucrose, 1 mmol/l MgCl2,
5 mmol/l KCl, 40% (v/v) glycerol, 0.25% (v/v) Triton
X-100, 0.1 mmol/l phenylmethanesulfonylfluoride (PMSF),
0.1% (v/v) 2-mercaptoethanol) and 15 ml 4% formaldehyde
solution. Then, 1.9 ml of 2 mol/l glycine was added to
quench the formaldehyde, and the mixture was incubated
for another 5 minutes under vacuum. The seedlings
were snap-frozen in liquid nitrogen, and ground to a fine
powder. The powder from two initial tubes was pooled
and suspended in 10 ml NIB, with added protease
inhibitor (Complete Protease Inhibitor Tablets; Roche, Basel,
Switzerland; two tablets in 150 ml NIB). The suspension
was filtered twice through Miracloth (Calbiochem/EMD
Millipore, Darmstadt, Germany), adding an additional 10 ml
NIB. The filtered nuclei suspension was spun for 15 minutes
at 4°C and 3000 × g. The supernatant was discarded, and
the pellet was resuspended in 4 ml NIB and transferred to
two 1.5 ml reaction tubes. After the tubes were spun
for 5 minutes at 4°C and 1900 × g, the supernatant was
removed, and the pellet was resuspended in 1 ml NIB,
followed by centrifugation under the above conditions. This
step was repeated twice. Then, the nuclei were washed
twice with 1.2× NEB buffer 4 (New England Biolabs,
Ipswich, MA, USA) (10× NEB buffer 4: 50 mmol/l
potassium acetate, 20 mmol/l Tris acetate, 10 mmol/l
magnesium acetate, 1 mmol/l dithiothreitol (DTT)), using the
centrifugation conditions described above. The nuclei were
finally resuspended in 500 µl 1.2× NEB buffer 4, with 5 µl
of 20% SDS added. The samples were incubated for
40 minutes at 65°C, followed by 20 minutes at 37°C under
constant shaking, then 50 µl of 20% Triton X-100 were
added. The mixture was incubated for 1 hour at 37°C under
constant shaking, then 60 µl of sample was removed as a
pre-digestion control.

For digestion, 15 µl 10× NEB buffer 4 and 115 µl H2O
were added to the samples, and digestion was started
using 100 U of HindIII restriction enzyme (New England
Biolabs). After 3 hours of incubation at 37°C, 200 U of
HindIII were added, followed by overnight incubation at
37°C. The next morning, 100 U of HindIII were added, and
samples were incubated for a final 2 hours. An aliquot
(80 µl) of the sample was transferred to a fresh tube, and
kept aside as a post-digestion control. To inactivate
HindIII, 20 µl 20% SDS were added, and samples were
incubated at 65°C for 25 minutes under constant shaking.
Samples were transferred to 15 ml conical tubes, and
700 µl of 10× ligation buffer (0.5 mol/l Tris-Cl, 0.1 mol/l
MgCl2, 0.1 mol/l DTT, pH 7.5), 375 µl of 20% Triton
X-100, and H2O to a final volume of 7 ml were added,
followed by 1 hour of incubation at 37°C under constant
shaking.

Ligation was performed by adding 70 µl of 100 mmol/l
ATP (Roche) and 50 Weiss Units (WU) of DNA Ligase
(Fermentas/ThermoFisher, Waltham, USA). The sample
was incubated for 5 hours at 16°C. During incubation,
an additional 10 WU of DNA ligase were added. Following
ligation, 30 µl 10 mg/ml proteinase K (Qbiogene; MP
Biomedicals, Santa Ana, CA, USA) were added, and the
sample was incubated overnight at 65°C. The next morning,
30 µl of 10 mg/ml RNase A (Roche) were added, and the
sample was incubated for 30 minutes at 37°C.

The DNA was purified by two chloroform:phenol
extractions, followed by ethanol precipitation using 1 ml
3 mol/l sodium acetate, 7 ml H2O and 25 µl glycogen,
taken up to a final volume of 50 ml with ice-cold ethanol.
The mixture was kept overnight at -80°C. The pellet was
finally resuspended in 150 µl H2O.

Pre-digestion control, post-digestion control, and the 
final 3C sample (120 ng of DNA each) were analyzed on 
1.5% agarose gels. Samples with satisfactory digestion 
were then pooled to proceed further. 

The 3C samples were digested with a final quantity of
0.2 U/µl of the secondary restriction enzymes DpnII or
NlaIII, respectively (New England Biolabs). The 4C digested
samples were analyzed on an agarose gel. For the 4C
ligation, 700 µl of T4 Ligase Buffer (Fermentas/ThermoFisher),
70 µl 100 mmol/l ATP, and 50 WU of DNA Ligase
(Fermentas/ThermoFisher) were taken up to 7 ml with
H2O; this mixture was added to the samples, and the
ligation reaction was incubated for 5 hours at 16°C. Finally,
the samples were purified by phenol:chloroform
extraction, followed by ethanol precipitation, and stored
at -20°C.

For each viewpoint, 16 PCRs (for detailed PCR conditions
and primer sequences, see Additional file 17: Table S1)
were set up, using 30 ng of 4C template for each reaction.
For ease of later Illumina library preparation, primers of a
subset of samples were designed with an Illumina
sequencing adapter tail (batch 1: MEA F6, MEA F8, PHE, FIS2,
CKI1, FWA, AG, PLC). For all other samples (batch 2:
AT1G51860, AT3G44380, SWN, hk4s, YAO), Illumina
sequencing adapters were ligated later in the library
preparation process.

An aliquot of each PCR product was analyzed on an
agarose gel, and the remaining PCR product was purified
using the QIAquick PCR Purification Kit (Qiagen, Hilden,
Netherlands), following the manufacturer's protocol.

Library preparation 

Hereafter, library preparation is described for samples
that had no Illumina (Illumina, San Diego, CA, USA)
adapter attached to the 4C primer. Samples of each
replicate were pooled in equimolar amounts, and assessed
on a Bioanalyzer (Agilent Technologies, Santa Clara, CA,
USA). Finally, each sample volume was adjusted to 100 µl
using H2O. Replicates were then split into two aliquots
of 50 µl each, and 10 µl of Resuspension Buffer (RSB;
Illumina) and 40 µl End-Repair Mix (ERP) (Illumina)
were added. The mixture was incubated for 30 minutes at
30°C. Then, 100 µl of Agencourt AMPure beads (Beckman
Coulter, Brea, CA, USA) were added, and the mixture
was incubated for 15 minutes at room temperature. The
reaction tubes were then placed on a magnetic stand. The
supernatants were removed without disturbing the beads,
and 400 µl of freshly prepared 80% ethanol was added.
After 30 seconds, the ethanol was replaced with another
400 µl of 80% ethanol. The supernatant was removed,
and the tubes were left open to dry. The beads binding the
4C PCR products were resuspended in 17.5 µl RSB, and
incubated for 2 minutes before being placed on a magnetic
stand for 15 minutes. Finally, 15 µl of sample was
transferred to a fresh 0.2 ml reaction tube. To each sample,
2.5 µl of RSB and 12.5 µl A-tailing Mix (ATL) (Illumina)
were added and mixed thoroughly, followed by incubation
at 37°C for 30 minutes. Following this, 2.5 µl of RSB, 2.5 µl
of DNA Ligase Mix (LIG) (Illumina) and 2.5 µl of indexed
DNA adapters (Illumina) were added, and mixed gently by
pipetting the mixture up and down. Subsequently, the
mixture was incubated for 10 minutes at 30°C. To inactivate
the reaction, 5 µl of Stop Ligase Mix (STL) (Illumina)
were added, and samples were transferred to a fresh
1.5 ml reaction tube. Then 42.5 µl of Agencourt
AMPure beads (Beckman Coulter) were added to each
tube, and the mixture was incubated for 15 minutes at
room temperature. The tubes were subsequently placed on
a magnetic stand for 2 minutes, then 80 µl of supernatant
were removed and replaced with 200 µl of freshly prepared
80% ethanol. After incubation for 30 seconds, the
supernatant was removed, and the tubes were left open to dry.
The ethanol washing step described above was
repeated once; then, the pellet was resuspended in 52.5 µl
RSB. After 2 minutes of incubation at room temperature,
tubes were placed on a magnetic stand for 2 minutes, then
50 µl of the supernatant were transferred to a fresh 1.5 ml
reaction tube. The Agencourt AMPure (Beckman Coulter)
cleanup was repeated once; however, at the final step,
instead of being suspended in 52.5 µl RSB, the pellet
was resuspended in 22.5 µl RSB, of which 20 µl were
transferred to a fresh 0.2 ml reaction tube. Samples
with adapters already attached to the 4C PCR primers
were treated in the same way from this point on. To
perform final library amplification, 5 µl of PCR Primer
Cocktail (PPC) and 25 µl of PCR Master Mix (PMM)
(both Illumina) were added to each tube. PCR was
performed under the following conditions: 98°C for 30 seconds;
then 12 cycles of 98°C for 10 seconds, 60°C for 30 seconds,
and 72°C for 30 seconds; followed by a final elongation at
72°C for 5 minutes. Samples were then transferred to a
1.5 ml reaction tube, and 50 µl of Agencourt AMPure
beads (Beckman Coulter) were added. After 15 minutes of
incubation at room temperature, the tubes were placed on
a magnetic stand for 2 minutes. Following this, 95 µl of
supernatant were removed, and the beads were washed
twice with 200 µl of freshly prepared 80% ethanol. After
the supernatant was removed, tubes were left open to dry.
The pellet was then resuspended in 32.5 µl RSB and
incubated for 2 minutes at room temperature. The tubes
were placed on a magnetic stand, and 30 µl of the purified
library were transferred to a fresh 1.5 ml reaction tube.
From each library a 10 nmol/1 stock in Tris-Cl (pH 8.5) 
with 0.1% (v/v) Tween 20 was prepared. All replicates 
in the libraries were subsequently pooled, and used for 
Illumina HiSeq 100 bp single end sequencing. For each 
batch of replicates, one lane per replicate was loaded 
(total of four lanes). Batch 1 replicate A had a total yield 
of 92,063,669 raw reads, with a mean quality score of 
35.35. Batch 1 replicate B had a total yield of 80,777,012 
raw reads with a mean quality score of 35.31; batch 2 
replicate A had a total yield of 43,296,252 raw reads 
with a mean quality score of 36.85; and batch 2 replicate 
B had a total yield of 55,187,969 raw reads with a mean 
quality score of 36.76. 

4C sequencing data pre-processing 

The two fastq files (one per replicate) were split into 
separate viewpoints according to the 4C primer sequences 
and the HindIII restriction pattern within the 
reads. No mismatches were allowed, and the remaining 
reads were discarded. After removal of primer and 
restriction site sequences, reads were trimmed to 30 bp 
and aligned to the Arabidopsis reference genome [48] 
using bowtie (version 0.12.7) [49] with the command 
line arguments -a -v 0 -m 25. For alignment statistics, 
see Additional file 17: Table S2. 
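
The read-splitting step described above can be illustrated with a minimal sketch. It assumes that each read starts with the viewpoint primer, directly followed by the HindIII site (AAGCTT), and that reads with fewer than 30 bp remaining are dropped; neither detail is stated in the text. The function name and the primer sequences are hypothetical.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
TRIM_LENGTH = 30         # reads were trimmed to 30 bp before alignment


def assign_viewpoint(read, primers):
    """Return (viewpoint, trimmed_read), or None if the read is discarded.

    Assumption: the read starts with the viewpoint primer, directly
    followed by the HindIII site; both must match without mismatches.
    """
    for viewpoint, primer in primers.items():
        prefix = primer + HINDIII_SITE
        if read.startswith(prefix):
            insert = read[len(prefix):]
            if len(insert) < TRIM_LENGTH:
                return None  # too short to trim to 30 bp (assumption)
            return viewpoint, insert[:TRIM_LENGTH]
    return None  # no exact primer match: read is discarded


# Hypothetical primer sequences, for illustration only.
primers = {"vp1": "ACGTACGTAC", "vp2": "TTGGCCAATT"}
read = "ACGTACGTAC" + "AAGCTT" + "G" * 40
print(assign_viewpoint(read, primers))
```

The trimmed reads would then be passed to the aligner with the arguments given above.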

Reads with multiple alignments were processed as 
described previously [50]. Because we estimated the length 
of a single interaction unit as 100 kb, we used an allocation 
distance of ±50 kb. To specify potential 4C fragments, we 
generated an in silico HindIII digest of the Arabidopsis 
Col-0 genome. Reads mapping to the ends of the resulting 
fragments were considered for further analysis. For a more 
robust measure of interactions, fragments were then used 
to generate windows spanning a larger region of the 
genome (that is, 100 fragments, corresponding to 180 kb 
on average). During this process, fragments closer than 
1 kb to the viewpoint were discarded, given that a large 
proportion of their reads would probably originate from 
incomplete digestion and/or self-circularization. Further- 
more, we discarded all fragments closer than 100 kb to a 
centromere, as the quality of alignments to centromeres is 
low. Finally, fragments whose distance from the primary 
restriction site to the first occurring secondary restriction 
site was 1000 bp or more with respect to both ends of the 
fragment were also removed. As a measure of interaction 
of a given window (interaction value), fragment counts 
were log-transformed to limit the impact of outlier 
fragments, and then summed. Depending on the downstream 
analysis, windows spanned either 100 fragments, starting 
at every fragment (overlapping), or 25 fragments, starting 
at every 25th fragment (non-overlapping). 
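
The window interaction value can be sketched as follows. The text does not give the base of the logarithm or how zero counts were handled, so log2(count + 1) is an assumption here, as is the function name. A step of 1 gives overlapping windows and a step equal to the window size gives non-overlapping ones.

```python
import math


def window_values(fragment_counts, size, step):
    """Sum log-transformed fragment counts over windows of `size` fragments.

    step=1 starts a window at every fragment (overlapping);
    step=size tiles the fragments without overlap.
    Assumption: log2(count + 1) is used as the transform.
    """
    logged = [math.log2(c + 1) for c in fragment_counts]
    values = []
    for start in range(0, len(logged) - size + 1, step):
        values.append(sum(logged[start:start + size]))
    return values


counts = [0, 3, 1, 1000, 0, 7, 1, 0]  # toy read counts per fragment
overlapping = window_values(counts, size=4, step=1)
non_overlapping = window_values(counts, size=4, step=4)
print(len(overlapping), len(non_overlapping))
```

In this toy input the fragment with 1,000 reads contributes about 10 to its window rather than 1,000, which is the damping effect the log transform is meant to provide.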



Processed 4C data files (split according to primer 
sequence) and raw-data sequencing files are publicly 
available from the Gene Expression Omnibus (GEO) under 
accession number GSE50181. 

Data processing of histone modifications, transcription, 
DNA methylation, and genomic sequencing 

To incorporate additional information, such as histone modification 
patterns and transcription rates, we obtained publicly 
available data from GEO [51], specifically ChIP sequencing 
(ChIP-seq) data GSM701923, GSM701924, GSM701925, 
GSM701926, GSM701927, GSM701928, GSM701929, 
GSM701930, GSM701931 [30], and RNA-seq data GSM701934 
[30]. Pre-processed DNA methylation data were obtained 
from [32]. 

ChIP-seq and RNA sequencing (RNA-seq) reads (SOLiD 
sequencing, 50 bp; Applied Biosystems/Life Technologies) 
were aligned to the Arabidopsis reference genome (Col-0, 
TAIR10 [52]) using bowtie (version 0.12.7) with the following 
command line arguments: -a -v 2 -m 25. Reads with 
multiple alignments were processed as described previously 
[50]. Allocation distances were set to ±5 kb and ±50 bp for 
the ChIP-seq and the RNA-seq data, respectively. Histone 
modification densities and DNA methylation densities were 
calculated for each individual 4C restriction fragment as the 
number of nucleotides covered by at least one uniquely 
alignable short sequence, divided by the total number of 
nucleotides in the fragment. 
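
The density calculation amounts to a covered fraction per fragment. The sketch below assumes half-open (start, end) coordinates and a hypothetical function name; it merges overlapping alignments so that each nucleotide is counted once.

```python
def covered_fraction(frag_start, frag_end, read_intervals):
    """Fraction of the fragment [frag_start, frag_end) covered by >= 1 read.

    read_intervals: iterable of half-open (start, end) alignments
    (the coordinate convention is an assumption of this sketch).
    """
    # Keep only alignments that overlap the fragment, clipped to its bounds.
    clipped = sorted(
        (max(s, frag_start), min(e, frag_end))
        for s, e in read_intervals
        if s < frag_end and e > frag_start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous block
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (frag_end - frag_start)


print(covered_fraction(0, 1000, [(0, 50), (25, 75), (900, 1100)]))
```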

To estimate potential biases related to sequence compos- 
ition (such as repetitive sequences), we obtained genomic 
DNA sequencing data (Illumina, 100 bp) of the data set 
GSM567816, and processed them identically to the 4C 
sequencing data. 

Assigning P-values to individual windows 

To estimate the significance of an interaction, we calculated 
for each window the probability (that is, P-value) of observing 
its interaction value by chance. Given that an interaction 
of two fragments would lead to a higher read count in 
the neighboring fragments as well (hence in the window), 
random shuffling of fragment positions and recalculation 
of window interaction values provides randomized inter- 
action data with the values following a normal distribution. 
Using the parameters of this distribution, a preliminary 
P-value was then calculated for each window. We repeated 
this process 1,000 times, and averaged for each window 
the P-values from all individual repetitions to obtain a final 
P-value. To take into account the differences between 
chromosome arms (for example, the different amount 
of DNA between the short arm and the long arm of 
chromosome 2), the P-values were calculated for each 
chromosome arm separately. 
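
A minimal sketch of this shuffling procedure for a single chromosome arm is given below. It assumes non-overlapping windows, an upper-tail P-value, and the population standard deviation of the shuffled window values as the normal parameters; these choices and all function names are illustrative rather than taken from the text.

```python
import random
from statistics import NormalDist, mean, pstdev


def window_sums(values, size):
    """Sum values over non-overlapping windows of `size` fragments."""
    return [sum(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def window_p_values(frag_values, size, repetitions=1000, seed=0):
    """Average upper-tail P-values over repeated fragment shuffles.

    frag_values: log-transformed read counts of one chromosome arm.
    """
    rng = random.Random(seed)
    observed = window_sums(frag_values, size)
    totals = [0.0] * len(observed)
    shuffled = list(frag_values)
    for _ in range(repetitions):
        rng.shuffle(shuffled)                  # randomize fragment positions
        null = window_sums(shuffled, size)     # recalculate window values
        dist = NormalDist(mean(null), pstdev(null))
        for k, value in enumerate(observed):
            totals[k] += 1.0 - dist.cdf(value)  # preliminary P-value
    return [t / repetitions for t in totals]    # final (averaged) P-value


# Toy arm: 40 fragments with one cluster of high values in the third window.
arm = [1.0] * 40
arm[8:12] = [6.0] * 4
p_values = window_p_values(arm, size=4, repetitions=200)
print(min(range(len(p_values)), key=lambda k: p_values[k]))
```

Calling the function once per arm mirrors the per-arm calculation described above.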

P-value thresholds were chosen to best fulfill the require- 
ments of either plotting or data analysis. Generally, we set 
the threshold for prey regions to 10'^. In the Circos plot of 
Figure 5A we chose P < 10'^ for better visibility. Because 
for various viewpoints, a threshold of 10'^ did not yield a 
sufficient number of prey regions for robust data analysis, 
we chose a threshold of P < 0.05 to perform PCA. 

Distance decay 

We estimated the genomic distance-dependent decay of 
the interaction probability over a distance of 1 kb to 10 Mb 
from the viewpoint. This stretch was log-transformed, and 
split into 41 intervals with length of 0.1 (on the log scale). 
For each sample, the reads of the fragments corresponding 
to the intervals were summed up and assigned to the inter- 
val. Given that the centromere acts as an interaction 
boundary, only fragments on the viewpoint's arm were 
considered. Read counts per interval were then divided by 
the total number of reads across all intervals, yielding 
contact probabilities that add up to 1 across the full 
distance. Given that some intervals contained only a few 
fragments and, in certain cases, only fragments from a subset 
of the viewpoints, we used a locally weighted scatterplot 
smoothing (LOESS) predictor fitted to the original data to 
calculate one single contact probability value for each inter- 
val. To obtain the slope, and hence the distance decay coef- 
ficient, we then approximated the data with a linear model. 
Slope and P-value were derived from the fit of the linear 
model to the values predicted by the LOESS fit. However, 
direct fitting of a linear model to the original data yielded 
almost equal results with a slope of -0.72 instead of -0.73, 
and an extremely low P value (<10'^^^). 
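
The binning and slope estimate can be sketched as below. The LOESS step is omitted and an ordinary least-squares line is fitted directly to the binned values, which the text reports gave almost the same slope. The exact bin boundaries, the use of bin centers as x values, and the function name are assumptions of this sketch.

```python
import math


def decay_slope(distances_bp, read_counts, lo=1e3, hi=1e7, bin_width=0.1):
    """Slope of log10(contact probability) versus log10(distance)."""
    n_bins = round((math.log10(hi) - math.log10(lo)) / bin_width)
    sums = [0.0] * n_bins
    for d, c in zip(distances_bp, read_counts):
        if lo <= d < hi:
            b = int((math.log10(d) - math.log10(lo)) / bin_width)
            sums[min(b, n_bins - 1)] += c  # sum reads per log-scale interval
    total = sum(sums)
    xs, ys = [], []
    for b, s in enumerate(sums):
        if s > 0:  # empty intervals are skipped in this sketch
            xs.append(math.log10(lo) + (b + 0.5) * bin_width)
            ys.append(math.log10(s / total))  # contact probability
    # Ordinary least squares for y = intercept + slope * x.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic fragments at the bin centers, following a power law (exponent -0.7).
centers = [10 ** (3 + (b + 0.5) * 0.1) for b in range(40)]
counts = [1e6 * d ** -0.7 for d in centers]
slope = decay_slope(centers, counts)
print(round(slope, 3))
```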

Centromere distance 

To analyze the effect of a viewpoint's distance to the 
centromere on the distribution of the observed interaction 
frequencies along chromosome arms, we calculated for 
each chromosome arm (except the viewpoint's arm) 
the distance to the centromere at which 50% of all 
reads were aligned, and then fitted a linear model. The 
procedure was performed twice, first using absolute 
values, and then relative distances, defined as the absolute 
distance divided by the length of the chromosome arm 
(transformed by taking the arcsine of the square root). 
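
The two quantities entering the linear model can be sketched as follows: the distance from the centromere at which half of the reads on an arm have been reached, and the arcsine square-root transform applied to relative distances. Function names and the toy values are hypothetical, and the linear fit itself is not shown.

```python
import math


def half_read_distance(distances, read_counts):
    """Distance from the centromere at which 50% of all reads are reached."""
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("no reads on this chromosome arm")
    running = 0.0
    for distance, count in sorted(zip(distances, read_counts)):
        running += count
        if running >= total / 2.0:
            return distance


def arcsine_sqrt(relative_distance):
    """Arcsine square-root transform for a proportion in [0, 1]."""
    return math.asin(math.sqrt(relative_distance))


# Toy arm of 5 Mb with four fragments at increasing centromere distance.
distances = [1e6, 2e6, 3e6, 4e6]
reads = [10, 10, 30, 50]
absolute = half_read_distance(distances, reads)
relative = arcsine_sqrt(absolute / 5e6)
print(absolute, relative)
```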

Principal component analysis 

All PCAs were based on non-overlapping windows that 
included 25 fragments. For each viewpoint, mean prey and 
control histone densities for each histone modification 
(that is, EMD) were calculated. Subsequently, PCA was 
performed on a dataset including mean EMD values of con- 
trol and prey regions for each viewpoint and EMD. PCA 
was performed using the built-in R princomp() function. 

Permutation test 

To analyze differences in the epigenetic landscape of prey 
and control regions, we randomly selected 50 prey and 50 
control regions (sampled) for each viewpoint, and obtained 
a corresponding randomized test set by pooling their EMDs 
and permuting them (shuffling them into two randomized 
groups of 50 values each). We then calculated the absolute 
differences in averaged EMDs between the sampled 
(RealDiff_ij) and the permuted (RandDiff_ij) prey and control 
regions, respectively. 

Repeating this step i times for each of the j viewpoints 
yielded an empirical distribution for RandDiff for every 
epigenetic modification with 13,000 values (j = 13 viewpoints, 
and i = 1,000 repetitions). Comparing the average 
RealDiff_m (mean across all repetitions and viewpoints) 
with this distribution then provided an empirical P-value 
(p = Σ(RandDiff_ij > RealDiff_m)/(i × j)), which was subsequently 
adjusted for multiple testing by calculating the false discovery 
rate (FDR; Benjamini-Hochberg). 
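
The permutation test and the subsequent correction can be sketched as below for one epigenetic mark. The input layout (a list of per-viewpoint prey and control EMD lists) and the function names are assumptions; sampling is done without replacement, which the text does not specify.

```python
import random
from statistics import mean


def permutation_p_value(viewpoints, n_sample=50, repetitions=1000, seed=0):
    """Empirical P-value for one epigenetic mark.

    viewpoints: list of (prey_emds, control_emds) pairs, one per viewpoint.
    """
    rng = random.Random(seed)
    real_diffs, rand_diffs = [], []
    for prey, control in viewpoints:        # j viewpoints
        for _ in range(repetitions):        # i repetitions
            p = rng.sample(prey, n_sample)
            c = rng.sample(control, n_sample)
            real_diffs.append(abs(mean(p) - mean(c)))
            pooled = p + c
            rng.shuffle(pooled)             # two randomized groups
            rand_diffs.append(
                abs(mean(pooled[:n_sample]) - mean(pooled[n_sample:])))
    real_mean = mean(real_diffs)            # mean over repetitions, viewpoints
    return sum(d > real_mean for d in rand_diffs) / len(rand_diffs)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, k in enumerate(order):
        rank = n - rank_from_top
        running_min = min(running_min, p_values[k] * n / rank)
        adjusted[k] = running_min
    return adjusted


# Toy EMDs: prey regions clearly denser than control regions.
prey = [0.8 + 0.001 * k for k in range(60)]
control = [0.2 + 0.001 * k for k in range(60)]
print(permutation_p_value([(prey, control)], repetitions=200))
```

One P-value per epigenetic mark would then be collected in a list and passed to `benjamini_hochberg`.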

Analysis of individual epigenetic marks employing 
GSEA-like analysis 

To test whether prey regions have a different epigenetic 
landscape from that of regions chosen randomly across 
the genome, we developed a procedure similar to the 
GSEA described previously [33]. It requires densities of 
EMDs (for example, CG methylation density or H3K9me2) 
assigned to all (n) regions in the genome (that is, non- 
overlapping windows spanning 25 restriction fragments), 
and a subset (m) of the regions as a test set (that is, prey 
regions with P < 0.01 in both replicates). During the 
procedure, the regions were first sorted according to their 
EMD. We then assigned a value of -1 to regions not in 
the test set, and a value of (n-m)/m to the regions in the 
test set (to assure that the sum of these values across all 
regions would be zero). In a third step, the cumulative 
sum of these values was calculated and the enrichment 
score (ES) was defined as the maximum (absolute) devi- 
ation from zero. If the regions in the test set were randomly 
distributed across the sorted list of all regions, the cumula- 
tive sum would fluctuate around zero with a relatively small 
ES. Conversely, a non-random distribution of the test set 
(for example, accumulation at one end of the sorted list) 
would lead to a high ES. A P-value could then be assigned 
by comparing an observed ES to an ES distribution ob- 
tained by randomly choosing m regions 10,000 times. 
To obtain one P-value per epigenetic feature, the ES were 
averaged across all viewpoints. As we were focusing on 
long-range interactions, we excluded all interactions within 
the viewpoint's arm. Because statistical tests for all 
epigenetic features used the same 4C data, 
P-values were adjusted for multiple testing by calculating 
the FDR (Benjamini-Hochberg). 
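
The enrichment score can be sketched for a single viewpoint as follows. Regions are identified by their index, the test set is passed as a set of indices, and the P-value is taken as the fraction of random sets with an ES at least as large as the observed one; these conventions and the function names are assumptions, and the averaging of ES across viewpoints is not shown.

```python
import random


def enrichment_score(emd, test_set):
    """Maximum absolute deviation of the running sum from zero.

    emd: density value per region; test_set: set of region indices.
    """
    n, m = len(emd), len(test_set)
    hit = (n - m) / m                      # value for regions in the test set
    order = sorted(range(n), key=lambda k: emd[k])
    running, best = 0.0, 0.0
    for k in order:
        running += hit if k in test_set else -1.0
        best = max(best, abs(running))
    return best


def es_p_value(emd, test_set, permutations=10000, seed=0):
    """Fraction of random region sets with ES >= the observed ES."""
    rng = random.Random(seed)
    observed = enrichment_score(emd, test_set)
    n, m = len(emd), len(test_set)
    exceed = 0
    for _ in range(permutations):
        random_set = set(rng.sample(range(n), m))
        if enrichment_score(emd, random_set) >= observed:
            exceed += 1
    return exceed / permutations


# Toy genome of 100 regions; the test set holds the 10 densest regions.
emd = [k / 100 for k in range(100)]
top = set(range(90, 100))
print(enrichment_score(emd, top))
```

With the weights chosen this way, the running sum always returns to zero after the last region, so the ES reflects only how unevenly the test set is spread along the sorted list.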

Plotting 

All plotting of 4C data, genomic features, and histone 
modification data was performed using either Circos 
[23] or built-in R plotting functions [53]. Code is available 
upon request. 

Data availability 

All sequencing data and processed 4C files are available 
from the Gene Expression Omnibus (GEO) under accession 
number GSE50181. 

Additional files 